Abstract

The Matthew effect refers to the adage written some two thousand years ago in the Gospel of St.
Matthew: "For to all those who have, more will be given." Even two millennia later, this idiom is used
by sociologists to qualitatively describe individual progress and the interplay between status and reward.
Quantitative studies of professional careers are traditionally limited by the difficulty in measuring progress
and the lack of data on individual careers. However, in some professions there are well-defined metrics that
quantify career longevity, success, and prowess, which together contribute to the overall success rating for
an individual employee. Here we present testable evidence, inherent in the remarkable statistical regularity
of career longevity distributions, of the age-old Matthew "rich get richer" effect, whereby longevity and
past success lead to a contemporaneous competitive advantage. We develop an exactly solvable stochastic
model that quantitatively incorporates the Matthew effect such that it can be validated in competitive
professions. These results demonstrate that statistical laws can exist even at the microscopic social level,
where the collective behavior of individuals can lead to emergent phenomena. We test our model on the
careers of 400,000 scientists using data from six high-impact journals. We further confirm our findings by
testing the model on the careers of more than 20,000 athletes in four sports leagues.
The rate of individual progress is fundamental to career development and success. In practice,
the rate of progress depends on many random factors. Interestingly, we find that the relatively
small rate of progress at the beginning of the career plays a crucial role in the evolution of the
career length. A direct result of the progress rate increasing with career position is the large
disparity between the number of successful long tenures and the number of unsuccessful short stints.
Here, we seek to describe career progression with a simple model that relies on two fundamental
ingredients: (i) random forward progress "up the success ladder", and (ii) random stopping times,
terminating a career. This model quantifies the "Matthew effect" by incorporating the everyday
property that the further along one is in a career, the easier it is to move forward. We test this
model on both scientific and sports careers, two professions in which accomplishments are methodically
recorded. We analyze publication careers within six high-impact journals: Nature, Science, the
Proceedings of the National Academy of Sciences (PNAS), Physical Review Letters (PRL), the New England
Journal of Medicine (NEJM), and Cell. We also analyze sports careers within four distinct leagues:
Major League Baseball (MLB), Korean Professional Baseball, the National Basketball Association
(NBA), and the English Premier League.

Career longevity is the fundamental metric that determines the overall legacy of an employee,
because other measurable contributions are related to the career length. Common experience in
most professions indicates that time is required for colleagues to gain faith in a newcomer's
abilities. Qualitatively, the acquisition of new opportunities mimics a standard positive feedback
mechanism (known in various fields as Malthusian growth, preferential attachment, the ratchet effect,
and the Matthew "rich get richer" effect [1, 2, 3]), which endows greater rewards to individuals
who are more accomplished than to individuals who are less accomplished.

In this paper we study the everyday topic of career longevity and reveal surprising complexity
arising from competition within social environments. We develop an exactly solvable stochastic
model, which predicts the functional form of the probability density function (pdf) $P(x)$ of career
longevity $x$ in competitive professions. The underlying stochastic process depends on only two
parameters, $\alpha$ and $x_c$. The first parameter, $\alpha$, is the power-law exponent that emerges in
the pdf of career longevity. This parameter is intrinsically related to the rate of progress early in the
career, the period during which professionals establish their reputations and secure future opportunities.
The second parameter, $x_c$, is a time scale that distinguishes newcomers from veterans.
I. QUANTITATIVE MODEL



In this model, every employee begins his/her career with approximately zero credibility and
must labor through a common learning curve. At each point in a career, there is an opportunity for
progress as well as the possibility of no progress. An opportunity can refer to a day at work or,
more generally, to any assignment given by an employing body. As a first step, we postulate
that the stochastic process governing longevity is similar to a Poisson process, in which progress is
made at any given step with some fixed probability. Each step forward contributes to the
employee's resume and reputation. Hence, we refine the process to be a spatial Poisson process,
in which the probability of progress $g(x)$ depends explicitly on the employee's position $x$ within the
career. Career longevity is then defined as the final location along the career ladder at the time of
retirement.

Employees begin their career with their first opportunity, $x = 1$, and make random forward
progress through time to career positions $x > 1$. Let $P(x,t)$ be the probability that at time $t$
an individual is at career position $x$. Because the progress rate $g(x)$ depends only on $x$, and
provided that it varies slowly with $x$, $P(x,t)$ assumes the familiar Poisson form, but with $g(x)$
inserted as the rate parameter,

$$P(x,t) = \frac{[g(x)\,t]^{x-1}}{(x-1)!}\, e^{-g(x)\,t} . \qquad (1)$$

We derive the spatial Poisson distribution $P(x,t)$ in the Appendix.
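As a minimal sketch (assuming Python and a rate function supplied as a callable; the name `spatial_poisson` is ours, not the authors'), Eq. (1) can be evaluated in log space so that large $x$ and $t$ do not overflow. With a constant rate it reduces to an ordinary Poisson distribution in $x - 1$, which gives a simple normalization check; for a position-dependent $g(x)$ the form is approximate, so the sum over $x$ need not equal one exactly.

```python
import math
from typing import Callable


def spatial_poisson(x: int, t: float, g: Callable[[int], float]) -> float:
    """Eq. (1): P(x, t) = [g(x) t]^(x-1) exp(-g(x) t) / (x-1)!.

    x is the career position (x >= 1), t the elapsed time, and g the
    position-dependent progress rate. Evaluated in log space for stability.
    """
    if x < 1:
        raise ValueError("career positions start at x = 1")
    lam = g(x) * t
    if lam <= 0.0:
        # At t = 0 (or with zero rate) everyone is still at the first position.
        return 1.0 if x == 1 else 0.0
    # math.lgamma(x) equals log((x - 1)!)
    return math.exp((x - 1) * math.log(lam) - lam - math.lgamma(x))


if __name__ == "__main__":
    constant_rate = lambda _x: 0.5
    total = sum(spatial_poisson(x, 10.0, constant_rate) for x in range(1, 200))
    print(f"normalization with constant g: {total:.12f}")
```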

According to the Matthew effect, it becomes easier for an individual to excel with increased
success and reputation. Hence, the choice of $g(x)$ should reflect the fact that newcomers, lacking
the familiarity of their peers, have a more difficult time moving forward, while seasoned veterans,
benefiting from their experience and reputation, often have an easier time moving forward. For
this reason we choose the progress rate $g(x)$ to have the functional form

$$g(x) = 1 - \exp[-(x/x_c)^{\alpha}] . \qquad (2)$$

This function exhibits the fundamental feature of increasing from approximately zero and
asymptotically approaching unity over some interval $x_c$. Furthermore, $g(x) \approx (x/x_c)^{\alpha}$
for small $x \ll x_c$. In Fig. 1, we plot $g(x)$ for several values of $\alpha$, with fixed $x_c = 10^3$ in
arbitrary units. We will show that the parameter $\alpha$ is the same as the power-law exponent $\alpha$ in
the pdf of career longevity $P(x)$ (Fig. 1 inset). The random process for forward progress can also be
recast in the form of random waiting times, where the average waiting time $\langle \omega(x) \rangle$
between successive steps is the inverse of the forward progress probability, $\langle \omega(x) \rangle = 1/g(x)$.
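A minimal Python sketch of Eq. (2) and of the waiting-time picture follows; only $x_c = 10^3$ is taken from Fig. 1, while the $\alpha$ values and function names are illustrative assumptions. The table it prints shows $g(x)$ tracking $(x/x_c)^{\alpha}$ for $x \ll x_c$ and saturating at one for $x \gg x_c$, so that the mean waiting time $1/g(x)$ is long early in a career and approaches one step later on.

```python
import math


def progress_rate(x: float, alpha: float, x_c: float) -> float:
    """Eq. (2): g(x) = 1 - exp[-(x / x_c)^alpha], computed with expm1 for accuracy at small x."""
    return -math.expm1(-((x / x_c) ** alpha))


def mean_waiting_time(x: float, alpha: float, x_c: float) -> float:
    """Average waiting time <w(x)> = 1 / g(x) between successive steps."""
    return 1.0 / progress_rate(x, alpha, x_c)


if __name__ == "__main__":
    x_c = 1e3  # the value used for Fig. 1 (arbitrary units)
    for alpha in (0.5, 1.0, 2.0):
        for x in (1.0, 10.0, 100.0, 1e3, 1e4):
            g = progress_rate(x, alpha, x_c)
            small_x = (x / x_c) ** alpha  # small-x limit g(x) ~ (x / x_c)^alpha
            print(f"alpha={alpha:<4} x={x:>7.0f}  g={g:.4g}  "
                  f"(x/x_c)^alpha={small_x:.4g}  <w>={1 / g:.4g}")
```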
We now address the fact that not every career is of the same length. Nearly every individual
faces the constant risk of losing his/her job, possibly as the result of poor performance, bad
health, an economic downturn, or even a change in the business strategy of his/her employer. Survival
in the workplace requires that the individual maintain his/her performance level with respect to all
possible replacements. In general, career longevity is influenced by many competing random
processes, which contribute to the random termination time $T$ of a career [4]. The distribution
$P(x,t)$ calculated in Eq. (1) is the conditional probability $P(x,T) = P(x, t \mid t = T)$ that an
individual has achieved career position $x$ at his/her termination time $t = T$. Hence, to obtain
an ensemble distribution of career longevity $P(x)$, we must average over the distribution $r(T)$ of
random termination times $T$,
$$P(x) = \int_0^{\infty} P(x,T)\, r(T)\, dT . \qquad (3)$$

We next make a suitable choice for $r(T)$. To this end, we introduce the hazard rate $H(T)$, defined
such that $H(T)\,\delta T$ is the conditional probability that failure occurs between $T$ and $T + \delta T$,
given that it has not occurred by time $T$. This is written as $H(T) = r(T)/S(T) = -\frac{d}{dT}\ln S(T)$,
where $S(T) = 1 - \int_0^T r(t)\, dt$ is the probability of a career surviving until time $T$. The
exponential pdf of termination times,

$$r(T) = x_c^{-1} \exp(-T/x_c) , \qquad (4)$$

has a constant hazard rate $H(T) = 1/x_c$ and thus assumes that hazards are equally distributed over
time in competitive professions. Substituting Eqs. (1) and (4) into Eq. (3), we obtain

$$P(x) = \frac{g(x)^{x-1}}{x_c\,[1/x_c + g(x)]^{x}} \approx \frac{1}{x_c\, g(x)}\, e^{-x/[x_c\, g(x)]} . \qquad (5)$$
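The following Python sketch, assuming illustrative values $\alpha = 0.5$ and $x_c = 100$ (not fitted values), checks numerically that averaging Eq. (1) over the exponential termination-time pdf of Eq. (4), as prescribed by Eq. (3), reproduces the exact closed form of Eq. (5), and shows how close the approximate form is. The trapezoidal rule, the upper cutoff `T_max`, and all function names are choices made only for this illustration.

```python
import math

ALPHA, X_C = 0.5, 100.0  # illustrative parameter values, not fitted ones


def g(x: float) -> float:
    """Progress rate of Eq. (2)."""
    return -math.expm1(-((x / X_C) ** ALPHA))


def p_given_T(x: int, T: float) -> float:
    """Eq. (1) evaluated at the termination time t = T."""
    lam = g(x) * T
    if lam <= 0.0:
        return 1.0 if x == 1 else 0.0
    return math.exp((x - 1) * math.log(lam) - lam - math.lgamma(x))


def r_exponential(T: float) -> float:
    """Eq. (4): exponential pdf of termination times."""
    return math.exp(-T / X_C) / X_C


def p_closed_form(x: int) -> float:
    """Exact (left-hand) expression of Eq. (5)."""
    gx = g(x)
    return gx ** (x - 1) / (X_C * (1.0 / X_C + gx) ** x)


def p_approx(x: int) -> float:
    """Approximate (right-hand) expression of Eq. (5)."""
    gx = g(x)
    return math.exp(-x / (gx * X_C)) / (gx * X_C)


def p_numeric(x: int, T_max: float = 3000.0, n: int = 30000) -> float:
    """Trapezoidal estimate of Eq. (3), truncating the T-integral at T_max."""
    h = T_max / n
    f = lambda T: p_given_T(x, T) * r_exponential(T)
    total = 0.5 * (f(0.0) + f(T_max)) + sum(f(k * h) for k in range(1, n))
    return total * h


if __name__ == "__main__":
    for x in (1, 5, 50):
        print(x, p_numeric(x), p_closed_form(x), p_approx(x))
```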

To incorporate a non-constant $H(T)$, one can use the more general Weibull distribution for the pdf
of termination times, $r(T) = \frac{\gamma}{x_c}\left(\frac{T}{x_c}\right)^{\gamma-1} \exp[-(T/x_c)^{\gamma}]$, where
$\gamma = 1$ corresponds to the exponential case [5]. In general, the hazard rate of the Weibull distribution
is $H(T) \propto T^{\gamma-1}$, where $\gamma > 1$ corresponds to an increasing hazard rate and $\gamma < 1$
corresponds to a decreasing hazard rate.
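A short Python sketch, assuming the Weibull form written above with scale $x_c$ (the value $x_c = 100$ and the function names are illustrative), confirms numerically that $H(T) = r(T)/S(T)$ is the constant $1/x_c$ for $\gamma = 1$, grows with $T$ for $\gamma > 1$, and decays for $\gamma < 1$.

```python
import math


def weibull_pdf(T: float, gamma: float, x_c: float) -> float:
    """r(T) = (gamma/x_c) (T/x_c)^(gamma-1) exp[-(T/x_c)^gamma]; gamma = 1 gives Eq. (4)."""
    u = T / x_c
    return (gamma / x_c) * u ** (gamma - 1.0) * math.exp(-(u ** gamma))


def survival(T: float, gamma: float, x_c: float) -> float:
    """S(T) = 1 - integral_0^T r(t) dt, which for the Weibull pdf is exp[-(T/x_c)^gamma]."""
    return math.exp(-((T / x_c) ** gamma))


def hazard(T: float, gamma: float, x_c: float) -> float:
    """H(T) = r(T) / S(T); valid for T > 0 (the pdf is singular at T = 0 when gamma < 1)."""
    return weibull_pdf(T, gamma, x_c) / survival(T, gamma, x_c)


if __name__ == "__main__":
    x_c = 100.0  # illustrative time scale
    for gamma in (0.5, 1.0, 2.0):
        rates = [hazard(T, gamma, x_c) for T in (10.0, 100.0, 300.0)]
        print(f"gamma={gamma}: H(T) at T = 10, 100, 300 -> {[round(h, 5) for h in rates]}")
```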

From the curves plotted in the inset of Fig. 1, one observes that $\alpha_c = 1$ is a special crossover
value for $P(x)$, between a bimodal $P(x)$ for $\alpha > 1$ and a monotonically decreasing $P(x)$ for
$\alpha < 1$. This crossover is due to the small-$x$ behavior of the progress rate, $g(x) \approx (x/x_c)^{\alpha}$
for $x \ll x_c$, which serves as a "potential barrier" that a young career must overcome. The width $x_w$
of the potential barrier, defined such that $g(x_w) = 1/x_c$, scales as $x_w/x_c \sim x_c^{-1/\alpha}$, i.e.
$x_w \sim x_c^{1-1/\alpha}$, so that for $x_c \gg 1$ the barrier extends beyond the first step ($x_w > 1$)
only when $\alpha > 1$. Hence, the value $\alpha_c = 1$ separates convex progress ($\alpha > 1$) from concave
progress ($\alpha < 1$) in early career development.
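The scaling of the barrier width can be checked with a few lines of Python (function names are ours). Solving $g(x_w) = 1/x_c$ exactly with Eq. (2) gives $x_w = x_c[-\ln(1 - 1/x_c)]^{1/\alpha}$, which reduces to $x_w \approx x_c^{1-1/\alpha}$ for $x_c \gg 1$; the printed values, for the $x_c = 10^3$ of Fig. 1 and illustrative $\alpha$, show $x_w$ falling below the first step for $\alpha < 1$ and growing well beyond it for $\alpha > 1$.

```python
import math


def barrier_width(alpha: float, x_c: float) -> float:
    """Exact x_w solving g(x_w) = 1/x_c for g(x) = 1 - exp[-(x/x_c)^alpha]."""
    return x_c * (-math.log1p(-1.0 / x_c)) ** (1.0 / alpha)


def barrier_width_scaling(alpha: float, x_c: float) -> float:
    """Leading-order estimate x_w ~ x_c * x_c^(-1/alpha), valid for x_c >> 1."""
    return x_c ** (1.0 - 1.0 / alpha)


if __name__ == "__main__":
    x_c = 1e3
    for alpha in (0.5, 1.0, 2.0, 4.0):
        print(f"alpha={alpha}: x_w={barrier_width(alpha, x_c):.4g}, "
              f"x_c^(1-1/alpha)={barrier_width_scaling(alpha, x_c):.4g}")
```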
FIG. 1: A demonstration of the fundamental relationship between the progress rate $g(x)$ and the career
longevity pdf $P(x)$. The progress rate $g(x)$ represents the probability of moving forward in the career
to position $x + 1$ from position $x$. The small value of $g(x)$ for small $x$ captures the difficulty in making
progress at the beginning of a career. The progress rate increases with career position $x$, capturing the role
of the Matthew effect. We plot five $g(x)$ curves with fixed $x_c = 10^3$ and different values of the parameter
$\alpha$. The parameter $\alpha$ emerges from the small-$x$ behavior of $g(x)$ as the power-law exponent in $P(x)$.
(Inset) Probability density functions $P(x)$ resulting from inserting $g(x)$ with varying $\alpha$ into Eq. (5).
The value $\alpha_c = 1$ separates two distinct types of longevity distributions. The distributions resulting
from concave career development, $\alpha < 1$, exhibit monotonic statistical regularity over the entire range,
with an analytic form approximated by the gamma distribution $\Gamma(x; \alpha, x_c)$. The distributions
resulting from convex career development, $\alpha > 1$, exhibit bimodal behavior. One class of careers is
stunted by the difficulty in making progress at the beginning of the career, analogous to a "potential
barrier". The second class of careers forges beyond the barrier and is approximately centered around the
crossover $x_c$ on a log scale.

In the case $\alpha > 1$, one class of careers is stunted by the barrier, while the other class of careers
excels, resulting in a bimodal $P(x)$. In the case $\alpha < 1$, it is relatively easy to begin the career,
resulting in a remarkable statistical regularity that bridges the gap between very short and very
long careers. This statistical regularity for $\alpha < 1$ can be approximated in two regimes,

$$P(x) \approx \begin{cases} x_c^{\alpha-1}\, x^{-\alpha} , & 1 \ll x \ll x_c , \\ x_c^{-1}\, e^{-x/x_c} , & x \gg x_c . \end{cases} \qquad (6)$$
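To see how well the two limiting forms of Eq. (6) describe the closed form of Eq. (5), one can compare them numerically. The Python sketch below assumes illustrative values $\alpha = 0.5$ and $x_c = 10^4$; the ratios it prints approach one when $(x/x_c)^{1-\alpha} \ll 1$ in the power-law regime, and when $g(x) \approx 1$, i.e. $x$ well beyond $x_c$, in the exponential regime.

```python
import math

ALPHA, X_C = 0.5, 1e4  # illustrative values with alpha < 1


def g(x: float) -> float:
    """Progress rate of Eq. (2)."""
    return -math.expm1(-((x / X_C) ** ALPHA))


def p_exact(x: float) -> float:
    """Exact expression of Eq. (5), evaluated in log space to avoid under/overflow."""
    gx = g(x)
    return math.exp((x - 1) * math.log(gx) - x * math.log(gx + 1.0 / X_C) - math.log(X_C))


def p_power_law(x: float) -> float:
    """First regime of Eq. (6): x_c^(alpha-1) x^(-alpha), for 1 << x << x_c."""
    return X_C ** (ALPHA - 1.0) * x ** (-ALPHA)


def p_exponential(x: float) -> float:
    """Second regime of Eq. (6): x_c^(-1) exp(-x/x_c), for x >> x_c."""
    return math.exp(-x / X_C) / X_C


if __name__ == "__main__":
    for x in (10, 100, 10**3, 10**5, 10**6):
        print(f"x={x:>8}: exact/power-law={p_exact(x) / p_power_law(x):.3f}, "
              f"exact/exponential={p_exact(x) / p_exponential(x):.3f}")
```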



It has been shown [6] that random stopping times can explain power-law behavior in many stochastic
systems that arise in the natural and social sciences, with predicted exponent values $\alpha > 1$. Our
results provide a mechanism that describes systems with $\alpha < 1$. Moreover, our model provides
a quantitative meaning for the power-law exponent $\alpha$ characterizing the distribution of career
longevity.

II. EMPIRICAL EVIDENCE 

The two essential ingredients of our stochastic model, namely random forward progress and
random termination times, are general and should in principle apply to many competitive
professions. The individuals, some of whom are championed as legends and stars, are judged by their
performance, usually on the basis of measurable metrics for longevity and success, which vary
between professions.

FIG. 2: Highly right-skewed pdfs $P(x)$ of career longevity in several high-impact scientific journals and
several major sports leagues. We analyze data from American baseball (Major League Baseball) over the
84-year period 1920-2004, Korean baseball (Korean Professional Baseball League) over the 25-year period
1982-2007, American basketball (National Basketball Association and American Basketball Association)
over the 56-year period 1946-2004, English soccer (Premier League) over the 15-year period 1992-2007,
and several scientific journals over the 42-year period 1958-2000. Solid curves represent best-fit
functions corresponding to Eq. (5). (A) Baseball fielder longevity measured in at-bats (pitchers excluded):
we find $\alpha \approx 0.77$, $x_c \approx 2500$ (Korea) and $x_c \approx 5000$ (USA). (B) Basketball longevity
measured in minutes played: we find $\alpha \approx 0.63$, $x_c \approx 21000$ minutes. (C) Baseball pitcher longevity
measured in innings pitched in outs (IPO): we find $\alpha \approx 0.71$, $x_c \approx 2800$ (Korea) and
$x_c \approx 3400$ (USA). (D) Soccer longevity measured in games played: we find $\alpha \approx 0.55$,
$x_c \approx 140$ games. (E and F) High-impact journals exhibit similar longevity distributions for the
"journal career length", which we define as the duration between an author's first and last paper in a
particular journal. Deviations occur for long careers due to data set limitations. (For comparison,
least-squares fits are plotted in panel (E) with parameters $\alpha = 0.40$, $x_c = 9$ years and in panel (F)
with parameters $\alpha = 0.10$, $x_c = 11$ years.) These statistics are summarized in Table S2 of the SI.

In scientific arenas, and in general, the metric for career position is difficult to define, even
though there are many conceivable metrics for career longevity and success [7, 8, 9]. We compare
author longevity within individual journals, which serve as arenas for competition, each with
established review standards that are related to the journal quality. As a first approximation, the
career longevity with respect to a particular journal can be roughly measured as the duration between
an author's first and last paper in that journal, reflecting his/her ability to produce at the top
tiers of science. This metric for longevity should not be confused with the career length of the
scientist, which is probably longer than the career longevity within any particular journal. Following
standard lifetime data analysis methods [10], we collect "completed" careers from our data set,
which begins at year $Y_0$ and ends at year $Y_f$. For each scientific career $i$, we calculate
$\langle \Delta\tau_i \rangle$, the average time between publications in the particular journal. A journal
career that begins with a publication in year $y_{i,0}$ and ends with a publication in year $y_{i,f}$ is
considered "complete" if the following two criteria are met: (a) $y_{i,f} < Y_f - \langle \Delta\tau_i \rangle$
and (b) $y_{i,0} > Y_0 + \langle \Delta\tau_i \rangle$. These criteria eliminate from our analysis incomplete
careers that possibly began before $Y_0$ or ended after $Y_f$. We then estimate the career length within
journal $j$ as $L_{i,j} = y_{i,f} - y_{i,0} + 1$, with a year allotted for publication time, and do not
consider careers with $y_{i,f} = y_{i,0}$. This reduces the size of each journal data set by approximately
25% (see Table S1 and the Supporting Information (SI) text for a description of data and methods). In [11]
we further analyze the scientific careers of the authors in these six journal databases, developing
normalized metrics for career success (citation "shares") and productivity (paper "shares").
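The completeness criteria (a) and (b) and the career-length estimate can be sketched in Python as follows. The function name, the input format (a list of publication years for one author in one journal), and the choice to average over consecutive papers, including those in the same year, are our assumptions; the example years are made up for illustration, with the window $Y_0 = 1958$ to $Y_f = 2000$ of Fig. 2.

```python
from typing import Optional, Sequence


def completed_career_length(years: Sequence[int], y_start: int, y_end: int) -> Optional[int]:
    """Journal career length L = y_f - y_0 + 1 if the career passes criteria (a) and (b).

    years   : publication years of one author in one journal (one entry per paper)
    y_start : first year Y_0 covered by the data set
    y_end   : last year Y_f covered by the data set
    Returns None for careers that are excluded from the analysis.
    """
    ys = sorted(years)
    y0, yf = ys[0], ys[-1]
    if yf == y0:
        return None  # careers with y_f = y_0 are not considered
    # <dtau>: average time between successive publications; assumed here to be the
    # mean gap over consecutive papers, which equals (y_f - y_0) / (n - 1).
    dtau = (yf - y0) / (len(ys) - 1)
    if yf < y_end - dtau and y0 > y_start + dtau:  # criteria (a) and (b)
        return yf - y0 + 1
    return None


if __name__ == "__main__":
    careers = {
        "A": [1970, 1975, 1980],  # complete: L = 11
        "B": [1960, 1962, 1964],  # begins within one average gap of Y_0
        "C": [1985, 1985],        # single-year career, excluded
        "D": [1990, 1998],        # ends within one average gap of Y_f
    }
    for name, years in careers.items():
        print(name, completed_career_length(years, 1958, 2000))
```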

In athletic arenas, the metrics for career position, success, and success rate are easier to
define [12]. In general, a career position in sports can be measured by the cumulative number of
in-game opportunities a player has obtained. In baseball, we define an opportunity as an "at bat"
(AB) for batters and an "inning pitched in outs" (IPO) for pitchers, while in basketball and soccer
we define the metrics for opportunity as "minutes played" and "games played", respectively.

In Fig. 2 we plot the distributions of career longevity for 20,000 professional athletes in four
distinct leagues and roughly 400,000 scientific careers in six distinct journals (the data are publicly
available [13, 3]). We observe universal statistical regularity corresponding to $\alpha < 1$ in the
career longevity distributions for three distinct sports and several high-impact journals (see Table
S2 for a summary of best-fit parameters). The disparity in career lengths indicates that it is very
difficult to sustain a competitive professional career, with most individuals making their debut and
finale over a relatively short time interval. The exponential cutoff in $P(x)$ beyond the crossover
value $x_c$ arises from the finite human lifetime and is reminiscent of the finite-size effects that
dominate the asymptotic behavior of any real system. The scaling regime is less pronounced in the
curves for journal longevity. This results from the granularity of our data set, which records
publications by year only; a finer time resolution (e.g., months between first and last publication)
would reveal a larger scaling regime. However, regardless of the scale, one observes the salient
feature of a large disparity between the frequencies of long and short careers.

In science, an author's success can be quantified by the total number of papers or citations in
a particular journal. Publication careers have the important property that the impact of
scientific work is time dependent. While many papers become outdated as the scientific body
of knowledge grows, there are instances where "late-blooming" papers make a significant impact a
considerable time after publication [14, 15]. In [11] we find that the pdf of the total number of
normalized citations for a particular author in a single journal over his/her entire career follows
an inverse cubic law, $P(z)\,dz \sim z^{-3}\,dz$.

In sports, however, career accomplishments do not wax or wane with time. In Fig. 3 we plot
the pdf $P(z)$ of career success $z$ for common metrics in baseball and basketball. Remarkably, the
power-law regime of $P(z)$ is governed by a scaling exponent that is approximately equal to the
scaling exponent of the longevity pdf $P(x)$. In the SI, we show analytically that the pdf $P(z)$ of
career success $z$ follows directly from a simple Mellin convolution of the pdf $P(x)$ of longevity
$x$ and the pdf $P(y)$ of prowess $y$.

FIG. 3: Probability density function $P(z)$ of common metrics for career success, $z$. Solid curves represent
best-fit functions corresponding to Eq. (5). (A) Career batting statistics in American baseball:
$x_c^{\rm Hits} \approx 1200$, $x_c^{\rm RBI} \approx 600$ (RBI = Runs Batted In). (B) Career statistics in American
basketball: $x_c^{\rm Points} \approx 8000$, $x_c^{\rm Rebounds} \approx 3500$. For clarity, the top set of data in
each plot has been multiplied by a constant factor of four in order to separate overlapping data.

The gamma distribution $P(x) = \Gamma(x; \alpha, x_c) \propto x^{-\alpha} e^{-x/x_c}$ is commonly employed in
statistical modeling and can be used as an approximate form of Eq. (5). One advantage of the gamma
distribution is that its cumulative distribution can be inverted in order to study the extreme statistics
corresponding to rare stellar careers. In the SI, we further analyze the relationship between the extreme
statistics of the gamma distribution and the selection processes for Hall of Fame museums. In general, the
statistical regularity of these distributions allows one to establish robust milestones within a particular
profession, which could be used for setting the corresponding financial rewards and pay scales.
In summary, a wealth of data recording various facets of social phenomena has become available
in recent years, allowing scientists to search for universal laws that emerge from human
interactions [16]. Theoretical models of social dynamics, employing methods from statistical physics,
have provided significant insight into the various mechanisms that can lead to emergent
phenomena [17]. An important lesson from complex systems theory is that oftentimes the details of the
underlying mechanism do not affect the macroscopic emergent phenomena. For baseball players
in Korea and the United States, we observe remarkable similarity between the pdfs of career
longevity (Fig. 2) and the pdfs of prowess (Fig. S1), despite these players belonging to completely
distinct leagues. This fact is consistent with the hypothesis that universal stochastic forces govern
career development in science, professional sports, and presumably a large class of competitive
professions.

In this paper we present strong empirical evidence for universal statistical laws that describe
career progress in competitive professions. Universal power-law behavior also occurs in many other
social complex systems [18-26]. Stemming from the simplicity of the assumptions, the mechanism
developed in this paper could apply elsewhere in society, for example to the duration of both
platonic and romantic friendships. Indeed, long relationships are harder to break than short ones,
with random factors inevitably terminating them forever. Also, supporting evidence for the
applicability of this model can be found in the similar truncated power-law pdfs with $\alpha < 1$
that describe the dynamics of connecting within online social networks [26].
We thank G. Viswanathan, G. Paul, F. Wang, and J. Tenenbaum for helpful comments. 



III. APPENDIX: THE SPATIAL POISSON DISTRIBUTION 

The master equation for the evolution of $P(x,N)$, with appropriate initial conditions, is

$$P(x+1, N+1) - P(x+1, N) = f(x)\,P(x,N) - f(x+1)\,P(x+1,N) , \qquad (7)$$

with initial condition

$$P(x+1, 0) = \delta_{x,0} , \qquad (8)$$

where $f(x)$ represents the probability that an employee obtains another future opportunity given
his/her resume at career position $x$. We next write the discrete-time, discrete-space master equation
in continuous-time, discrete-space form,

$$\frac{\partial P(x+1,t)}{\partial t} = g(x)\,P(x,t) - g(x+1)\,P(x+1,t) , \qquad (9)$$

where $g(x) \equiv f(x)/\delta t$ and $t = N\,\delta t$ (for an extensive discussion of the master equation
formalism, see Ref. [27]). Taking the Laplace transform of both sides, one obtains

$$s\,P(x+1,s) - P(x+1,t=0) = g(x)\,P(x,s) - g(x+1)\,P(x+1,s) . \qquad (10)$$

From the initial condition in Eq. (8) we see that the second term on the left-hand side vanishes for
$x \geq 1$. Solving for $P(x+1,s)$, we obtain the recurrence equation

$$P(x+1,s) = \frac{g(x)}{s + g(x+1)}\, P(x,s) . \qquad (11)$$

If the first derivative $dg/dx$ is not too large, we can replace $g(x+1)$ with $g(x)$ in the equation
above. Then, one can verify the ansatz

$$P(x,s) = \frac{g(x)^{x-1}}{[s + g(x)]^{x}} , \qquad (12)$$

which is the Laplace transform of the spatial Poisson distribution $P(x,t; \lambda = g(x))$ [28]. As
usual, the Laplace transform is defined as $\mathcal{L}\{f(t)\} = \tilde{f}(s) = \int_0^{\infty} dt\, f(t)\, e^{-st}$.
Inverting the transform, we recover Eq. (1),

$$P(x,t) = \frac{[g(x)\,t]^{x-1}}{(x-1)!}\, e^{-g(x)\,t} .$$
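The master equation can also be integrated directly, which gives a numerical check on the spatial Poisson ansatz. The Python sketch below (function names, the forward-Euler scheme and the step sizes are our choices, not the authors') integrates Eq. (9) from the initial condition of Eq. (8). For a constant rate the result should match the ordinary Poisson distribution; for a position-dependent $g(x)$ the printed discrepancy gives a rough sense of how much replacing $g(x+1)$ by $g(x)$ matters for the illustrative parameters chosen.

```python
import math
from typing import Callable, List


def integrate_master_equation(g: Callable[[int], float], x_max: int,
                              t_end: float, dt: float = 1e-3) -> List[float]:
    """Forward-Euler integration of Eq. (9) for positions x = 1, ..., x_max.

    Starts from P(1, 0) = 1, the initial condition of Eq. (8). Position 1 only
    loses probability; every higher position gains g(x-1) P(x-1) and loses g(x) P(x).
    Probability flowing past x_max is dropped, so x_max should be large enough.
    """
    p = [0.0] * (x_max + 1)  # p[x] holds P(x, t); index 0 is unused and stays zero
    p[1] = 1.0
    rates = [0.0] + [g(x) for x in range(1, x_max + 1)]
    for _ in range(int(round(t_end / dt))):
        flux = [rates[x] * p[x] for x in range(x_max + 1)]  # g(x) P(x, t)
        for x in range(1, x_max + 1):
            p[x] += dt * (flux[x - 1] - flux[x])
    return p


if __name__ == "__main__":
    # Illustrative, slowly varying rate: Eq. (2) with alpha = 0.5 and x_c = 1000.
    rate = lambda x: -math.expm1(-((x / 1000.0) ** 0.5))
    t = 50.0
    p = integrate_master_equation(rate, x_max=40, t_end=t, dt=0.01)
    ansatz = [0.0] + [math.exp((x - 1) * math.log(rate(x) * t) - rate(x) * t - math.lgamma(x))
                      for x in range(1, 41)]
    print("max |numerical - spatial Poisson ansatz|:",
          max(abs(a - b) for a, b in zip(p, ansatz)))
```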
Supporting Information



I. DATA EXTRACTION AND METHODS 

The data analyzed in this paper were downloaded from ISI Web of Knowledge in May 2009. We
restrict our analysis to publications classified as "Articles", which excludes reviews, letters to the
editor, corrections, etc. Each article summary includes a field for the author identification consisting
of a last name and first and middle initials (e.g., the author name John M. Doe would be stored as
"Doe, J" or "Doe, JM" depending on the author's designation). From these fields, we group together the
works of individual authors within a particular journal and analyze metrics for career longevity and
success.

For author $i$ we combine all articles in journal $j$ for which he/she was listed as a coauthor. The
total number of papers for author $i$ in journal $j$ over the 50-year period is $n(i,j)$. Following
standard methods in lifetime statistics [1], we isolate "completed" careers from our data set, which
begins at year $Y_0$ and ends at year $Y_f$. For each author $i$, we calculate $\langle \Delta\tau_i \rangle$,
the average time $\Delta\tau_i$ between successive publications in a particular journal. A career that
begins with the first recorded publication in year $y_{i,0}$ and ends with the final recorded publication
in year $y_{i,f}$ is considered "complete" if the following two criteria are met:

(i) $y_{i,f} < Y_f - \langle \Delta\tau_i \rangle$ and
(ii) $y_{i,0} > Y_0 + \langle \Delta\tau_i \rangle$.

We then estimate the career length within journal $j$ as $L_{i,j} = y_{i,f} - y_{i,0} + 1$, and do not consider
careers with $y_{i,f} = y_{i,0}$. This reduces the size of the data set by approximately 25% (compare the
raw data set sizes $N$ to the pruned data set sizes $N^*$ in Table S1).

There are some potential sources of systematic error in the use of this database: 

• i) Degenerate names → increases career totals. Radicchi et al. [2] observe that this method
of concatenated author ID leads to a pdf $P(d)$ of degeneracy $d$, $P(d) \sim d^{-3}$.

• ii) Authors using middle initials in some but not all instances of publication → decreases
career totals.

• iii) A mid-career change of last name → decreases career totals.

• iv) Sampling bias due to the finite time period. Recent young careers are biased toward short
careers. Long careers located towards the beginning $Y_0$ or end $Y_f$ of the database are biased
towards short careers.

II. A ROBUST METHOD FOR CLASSIFYING CAREERS 

Professional sports leagues are geared around annual championships that celebrate the accom- 
plishments of teams over a whole season. On a player level, professional sports leagues annually 
induct retired players into "halls of fame" in order to celebrate and honor stellar careers. Induction 
immediately secures an eternal legacy for those that are chosen. However, there is no standard 
method for inducting players into a Hall of Fame, with subjective and political factors affecting 
the induction process. 

In this section we propose a generic and robust method for measuring careers. We find that the
pdf for career longevity can be approximated by the gamma distribution,

$$P(x)\,dx = \frac{x^{-\alpha}\, e^{-x/x_c}}{x_c^{1-\alpha}\,\Gamma(1-\alpha)}\, dx , \qquad (S1)$$

with moments $\langle x^n \rangle = x_c^{n}\, \Gamma(n+1-\alpha)/\Gamma(1-\alpha)$, where we restrict our considerations to
the case $\alpha < 1$, with $x_c \gg 1$. This distribution allows us to calculate the extreme value $x^*$ such that
only a certain fraction $f$ of players exceed this value with respect to the distribution $P(x)$,

$$f = \int_{x^*}^{\infty} P(x)\, dx = \frac{\Gamma(1-\alpha, x^*/x_c)}{\Gamma(1-\alpha)} = Q[1-\alpha, x^*/x_c] , \qquad (S2)$$

where $Q[1-\alpha, x^*/x_c]$ is the regularized (upper incomplete) gamma function. This function can be
easily inverted numerically using computer packages, e.g. Mathematica, with the result
$x^* = x_c\, Q^{-1}[1-\alpha, f]$.

In Table S2 we provide $x^*$ with respect to career longevity and career metrics for several sports,
using the value of $f$ corresponding to the American Baseball Hall of Fame in Cooperstown,
NY, USA. This hall of fame has inducted 276 players out of the 14,644 players that exist in
Sean Lahman's baseball database between the years 1879-2002. This corresponds to a fraction
$f = 276/14{,}644 \approx 0.019$. It is interesting to note that the last column, $x^*/\sigma_x = \beta \approx 3.87$,
where $\sigma_x = x_c\sqrt{1-\alpha}$ is the standard deviation of the distribution, is approximately the same
for all the gamma distributions analyzed. Thus, this value provides a robust method for determining whether
a player's career is fame-worthy, independent of sport or country. The highly celebrated milestone of
3,000 hits in baseball corresponds to the value $1.26\,\beta\sigma_{\rm Hits}$. Only 27 players have exceeded
this benchmark in their professional careers, while only 86 have exceeded the arbitrary 2,500
benchmark. Hence, it makes sense to set the benchmark for all milestones at a value $x^* = \beta\sigma_x$
corresponding to each distribution of career metrics.
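A stdlib-only Python sketch of Eq. (S2) and its inversion follows, as an alternative to the Mathematica route mentioned above (the series for the incomplete gamma function, the bisection solver, and the function names are our choices; `scipy.special.gammainccinv` would serve the same purpose). It recomputes $f = 276/14{,}644$ and, for the exponents quoted in Fig. 2, the ratio $x^*/\sigma_x$, with $\sigma_x$ obtained from the moments of Eq. (S1); the values should come out close to the $\beta$ quoted above.

```python
import math


def reg_upper_gamma(s: float, z: float, tol: float = 1e-15) -> float:
    """Regularized upper incomplete gamma Q(s, z) = Gamma(s, z) / Gamma(s), for s > 0.

    Uses the series for the lower function,
    gamma(s, z) = z^s e^(-z) * sum_k z^k / (s (s+1) ... (s+k)),
    which converges for all z but is only practical for moderate z.
    """
    if z <= 0.0:
        return 1.0
    term = 1.0 / s
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= z / (s + k)
        total += term
    return 1.0 - math.exp(s * math.log(z) - z - math.lgamma(s)) * total


def extreme_value(alpha: float, x_c: float, f: float) -> float:
    """x* = x_c Q^{-1}[1 - alpha, f] from Eq. (S2), found by bisection (Q decreases in z)."""
    s = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while reg_upper_gamma(s, hi) > f:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if reg_upper_gamma(s, mid) > f:
            lo = mid
        else:
            hi = mid
    return x_c * 0.5 * (lo + hi)


def gamma_std(alpha: float, x_c: float) -> float:
    """Standard deviation from the moments of Eq. (S1); equals x_c sqrt(1 - alpha)."""
    mean = x_c * math.gamma(2.0 - alpha) / math.gamma(1.0 - alpha)
    second = x_c ** 2 * math.gamma(3.0 - alpha) / math.gamma(1.0 - alpha)
    return math.sqrt(second - mean ** 2)


if __name__ == "__main__":
    f = 276 / 14644  # Hall of Fame inductees / players in the database
    print(f"f = {f:.4f}")
    for alpha in (0.55, 0.63, 0.71, 0.77):  # exponents quoted in Fig. 2
        x_star = extreme_value(alpha, 1.0, f)  # in units of x_c
        print(f"alpha={alpha}: x*/x_c = {x_star:.3f}, "
              f"x*/sigma_x = {x_star / gamma_std(alpha, 1.0):.3f}")
```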

We check for consistency by comparing the cutoff value $x^*$ calculated using the gamma distribution
with the value $x^*_d$ derived from the database of career statistics. Referring to the actual
set of all baseball players from 1871-2006, to achieve a fame value $f_d \approx 0.019$ with respect to
hits, one should set the statistical benchmark at $x^*_d \approx 2250$, which accounts for 146 players (this
assumes that approximately half of all baseball players are not pitchers, whom we exclude from
this calculation of $f_d$). The value $x^*_d \approx 2250$ agrees well with the value calculated from the
gamma distribution, $x^* \approx 2366$. Of these 146 players with career hit tallies greater than 2250, 126
players have been eligible for at least one induction round, and 82 of these players have been
successfully inducted into the American baseball Hall of Fame. Thus, a player with a career hit tally
above $x^* \approx x^*_d$ has a 65% (82/126) chance of being accepted, based on those merits alone.
Repeating the same procedure for career strikeouts obtained by pitchers in baseball, we obtain the
milestone value $x^*_d \approx 1525$ strikeouts, and for career points in basketball we obtain the value
$x^*_d \approx 16,300$ points. Nevertheless, the overall career must be taken into account, which raises
the bar and accounts for the less-than-perfect success rate of being voted into a hall of fame, given
that a player has had a statistically stellar career in one statistical category.

III. CAREER METRICS

In Fig. 4 we plot common career metrics for success in American baseball and American
basketball. Note that the exponent $\alpha$ of the pdf $P(z)$ of total career successes $z$ is approximately
equal to the exponent $\alpha$ of the pdf $P(x)$ of career longevity $x$ (see Table S2). In this section,
we provide a simple explanation for the similarity between the power-law exponent for career
longevity (Fig. 2) and the power-law exponent for career success (Fig. 4).

Consider a longevity distribution that is power-law distributed, $P(x) \sim x^{-\alpha}$, over the entire
range $1 < x < x_c < \infty$. The cutoff $x_c$ represents the finiteness of human longevity, accounted
for by the exponential decay in Eq. (6). Also, assume that the prowess $y$ has a pdf $P(y)$ that is
characterized by a mean and standard deviation, which represent the talent level among professionals
(see Ref. [3] for the corresponding prowess distributions in Major League Baseball). In one case,
the distribution is right-skewed and approximately exponential (as for home runs); in other cases,
the distributions are essentially Gaussian. Regardless of the distribution type, the prowess pdfs $P(y)$
are confined to the domain $\delta < y < 1$, where $\delta > 0$.

Assume that in any given appearance, a person can apply his/her natural prowess towards
achieving a success, independent of past success. Although prowess is refined over time, this
should not substantially alter our demonstration. Since not all professionals have the same career
length, the career total combines these two distributions through the product of longevity and
prowess. The career success total $z = xy$ then has the distribution



$$P(z) = \int\!\!\int dy\, dx\, P(y)\, P(x)\, \delta(xy - z) = \int\!\!\int dy\, dx\, P(y)\, P(x)\, \delta\big(x(y - z/x)\big) = \int dx\, P\!\left(\frac{z}{x}\right) P(x)\, \frac{1}{x} .$$

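This integral has three domains (Ref. [4]),

$$P(z) \propto \begin{cases} \int_{1}^{z/\delta} dx\, P(z/x)\, x^{-(\alpha+1)} , & \delta < z < 1 , \\ \int_{z}^{z/\delta} dx\, P(z/x)\, x^{-(\alpha+1)} , & 1 < z < x_c\delta , \\ \int_{z}^{x_c} dx\, P(z/x)\, x^{-(\alpha+1)} , & x_c\delta < z < x_c . \end{cases} \qquad (S3)$$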



The first regime 5 < z < 1 is irrelevant, and is not observed since z is discrete. For the first case 
of an exponentially distributed prowess, 



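$$
P(z) \propto
\begin{cases}
z^{-\alpha}, & 1 < z < x_c\delta, \\[1ex]
\exp\!\left(-\dfrac{z}{\lambda x_c}\right), & x_c\delta < z < x_c,
\end{cases}
\tag{S4}
$$

where λ is the characteristic scale of the exponential prowess pdf, P(y) ∝ exp(−y/λ). 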



In Ref. [3] we mainly observe the exponential tail in the home-run distribution, as the above 
form suggests for the regime x_c δ < z < x_c, which dominates because δ is small for the right-skewed 
home-run prowess distribution. In the case of a normally distributed prowess, however, the power-law 
behavior of the longevity distribution carries over to large values of z in the career success 
distribution P(z), since x_c δ > 10^3: 



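$$
P(z) \propto
\begin{cases}
z^{-\alpha}, & 1 < z < x_c\delta, \\[1ex]
\displaystyle\int_z^{x_c} dx\, P\!\left(\frac{z}{x}\right) x^{-(\alpha+1)}, & x_c\delta < z < x_c.
\end{cases}
\tag{S5}
$$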
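The argument can be checked by direct sampling. The sketch below is illustrative only: the parameters (α = 0.7, x_c = 10^4, Gaussian prowess with mean 0.5 and standard deviation 0.1 truncated to [δ, 1] with δ = 0.2) are hypothetical, x is treated as continuous rather than as a count, and the helper names are ours. It draws x from P(x) ∝ x^{-α} on [1, x_c] by inverting the CDF, forms z = xy, and fits the log-log slope of a log-binned histogram inside the power-law window 2 < z < x_c δ/2 and inside the cutoff window 2x_c δ < z < 0.7x_c.

```python
import numpy as np

def sample_truncated_power_law(rng, alpha, x_c, n):
    """Inverse-CDF samples from P(x) ∝ x^(-alpha) on [1, x_c], for 0 < alpha < 1."""
    a = 1.0 - alpha
    u = rng.random(n)
    return (1.0 + u * (x_c**a - 1.0)) ** (1.0 / a)

def sample_truncated_normal(rng, mu, sigma, lo, hi, n):
    """Gaussian prowess restricted to [lo, hi] by redrawing out-of-range values."""
    y = rng.normal(mu, sigma, n)
    bad = (y < lo) | (y > hi)
    while bad.any():
        y[bad] = rng.normal(mu, sigma, bad.sum())
        bad = (y < lo) | (y > hi)
    return y

def fitted_slope(z, z_min, z_max, n_bins=20):
    """Least-squares slope of log10 P(z) vs log10 z over log-spaced bins in [z_min, z_max]."""
    edges = np.logspace(np.log10(z_min), np.log10(z_max), n_bins + 1)
    counts, _ = np.histogram(z, bins=edges)
    density = counts / (len(z) * np.diff(edges))   # normalized pdf estimate per bin
    centers = np.sqrt(edges[:-1] * edges[1:])       # geometric bin centers
    slope, _ = np.polyfit(np.log10(centers), np.log10(density), 1)
    return slope

rng = np.random.default_rng(0)
alpha, x_c, delta = 0.7, 1e4, 0.2                   # hypothetical parameters
n = 1_000_000
x = sample_truncated_power_law(rng, alpha, x_c, n)  # career longevity
y = sample_truncated_normal(rng, 0.5, 0.1, delta, 1.0, n)  # prowess
z = x * y                                           # career success total

slope = fitted_slope(z, 2.0, 0.5 * x_c * delta)     # inside 1 < z < x_c*delta
tail_slope = fitted_slope(z, 2.0 * x_c * delta, 0.7 * x_c, n_bins=6)
print(f"power-law window: exponent {-slope:.3f} (alpha = {alpha})")
print(f"cutoff window:    local slope {tail_slope:.2f}")
```

The first fit should recover an exponent close to α, as Eq. (S5) predicts, while the local slope in the cutoff window is much steeper because few prowess values exceed z/x_c there.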
Thus, the main result of this demonstration is that the distribution P(z) retains the power-law 
exponent α of the career-longevity distribution P(x) when the prowess is distributed with a 
characteristic mean and standard deviation. The same result follows from the simplification of 
representing the prowess distribution P(y) as an essentially uniform distribution over a reasonable 
domain of y, which simplifies the integral in Eq. (S3) while preserving the inherent power-law 
structure. 
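To illustrate the uniform-prowess simplification, here is a sketch that assumes P(y) is uniform on [δ, 1], with hypothetical values α = 0.7, x_c = 10^4 and δ = 0.2. It evaluates the second and third lines of Eq. (S3) with a midpoint rule over the allowed x range and compares them with the closed forms obtained by integrating x^{-(α+1)} directly: z^{-α}(1 − δ^α)/[α(1 − δ)] for 1 < z < x_c δ and z^{-α}[1 − (z/x_c)^α]/[α(1 − δ)] for x_c δ < z < x_c. The function names are ours.

```python
import math

def p_z_uniform(z, alpha, x_c, delta, n_steps=20000):
    """Unnormalized P(z) from Eq. (S3) with P(y) uniform on [delta, 1],
    integrated over x by the midpoint rule."""
    lo = max(1.0, z)           # x >= 1 and y = z/x <= 1
    hi = min(x_c, z / delta)   # x <= x_c and y = z/x >= delta
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n_steps
    p_y = 1.0 / (1.0 - delta)  # uniform prowess density on [delta, 1]
    total = 0.0
    for k in range(n_steps):
        x = lo + (k + 0.5) * h
        total += p_y * x ** (-(alpha + 1.0))
    return total * h

def p_z_uniform_closed(z, alpha, x_c, delta):
    """Closed forms of the same integral, valid for 1 < z < x_c."""
    if z < x_c * delta:
        return z ** -alpha * (1.0 - delta ** alpha) / (alpha * (1.0 - delta))
    return z ** -alpha * (1.0 - (z / x_c) ** alpha) / (alpha * (1.0 - delta))

alpha, x_c, delta = 0.7, 1e4, 0.2   # hypothetical parameters
for z in (10.0, 100.0, 5000.0):
    print(z, p_z_uniform(z, alpha, x_c, delta), p_z_uniform_closed(z, alpha, x_c, delta))
```

In the middle regime the ratio P(100)/P(10) equals 10^{-α} exactly, which is the power law inherited from P(x); above x_c δ the factor 1 − (z/x_c)^α bends P(z) down to zero at z = x_c.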

In Fig. S1 we plot the prowess distributions that correspond to the career success distributions 
plotted in Fig. 4. Interestingly, the competition level inferred from the prowess distributions 
indicates that Korean and American baseball are nearly equivalent. Note also that the prowess 
distributions for rebounds per minute are bimodal, because player positions in basketball are 
more specialized. 




[Figure S1: horizontal axis, prowess (rate of success per appearance)]



FIG. S1: Probability density functions of seasonal prowess for several career metrics. Each pdf is approximately 
Gaussian, except for the bimodal curve for rebound prowess, NBA (Reb.), which reflects the specialization 
of player positions in basketball. Note also the remarkable similarity between the distributions for 
American (MLB) and Korean (KBB) baseball players. 
TABLE S1: Summary of data sets for each journal. N is the total number of unique (but possibly 
degenerate) name identifications; N* is the number of unique name identifications after pruning 
the data set of incomplete careers. 

Journal   Years       Articles   Authors, N   N*
Nature    1958-2008   65,709     130,596      94,221
Science   1958-2008   48,169     109,519      82,181
PNAS      1958-2008   84,520     182,761      118,757
PRL       1958-2008   85,316     112,660      72,102
CELL      1974-2008   11,078     31,918       23,060
NEJM      1958-2008   17,088     66,834       49,341
TABLE S2: Data summary for the pdfs of career statistical metrics. The values α and x_c are determined 
via the least-squares method applied to the career data. The distribution moments ⟨x⟩ and σ_x and the fame 
crossover value x* are extracted from the corresponding gamma distribution. The units for the metric x are 
indicated in parentheses alongside the league in the first column. For publication distributions, the career 
longevity metric x is measured in years. 

Professional league       Least-squares values          Gamma distribution values
(success metric)          α             x_c             ⟨x⟩     σ_x     x*      x*/⟨x⟩   x*/σ_x
MLB (H)                   0.76 ± 0.02   1240 ± 150      303     612     2366    7.807    3.864
MLB (RBI)                 0.76 ± 0.02   570 ± 80        138     280     1080    7.822    3.864
NBA (Pts)                 0.69 ± 0.02   7840 ± 760      2408    4345    16854   7.001    3.879
NBA (Reb)                 0.69 ± 0.02   3500 ± 130      1100    1967    7630    6.935    3.880

Professional league       Least-squares values          Gamma distribution values
(opportunities)           α             x_c             ⟨x⟩     σ_x     x*      x*/⟨x⟩   x*/σ_x
KBB (AB)                  0.78 ± 0.02   2600 ± 320      575     1217    4695    8.164    3.855
MLB (AB)                  0.77 ± 0.02   5300 ± 870      1201    2515    9702    8.079    3.858
MLB (IPO)                 0.72 ± 0.02   3400 ± 240      950     1792    6943    7.308    3.874
KBB (IPO)                 0.69 ± 0.02   2800 ± 160      841     1520    5894    7.012    3.879
NBA (Min)                 0.64 ± 0.02   20600 ± 1900    7653    12564   48841   6.382    3.887
UK (G)                    0.56 ± 0.02   138 ± 14        61      92      359     5.839    3.895

Academic journal          Least-squares values
(career length in years)  α             x_c
Nature                    0.38 ± 0.03   9.1 ± 0.2
PNAS                      0.30 ± 0.02   9.8 ± 0.2
Science                   0.40 ± 0.02   8.7 ± 0.2
CELL                      0.36 ± 0.05   6.9 ± 0.2
NEJM                      0.10 ± 0.02   10.7 ± 0.2
PRL                       0.31 ± 0.04   9.8 ± 0.3
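As a rough consistency check on Table S2, assume that the fitted longevity pdf has the form P(x) ∝ x^{-α} exp(−x/x_c), i.e. a power law with the exponential cutoff of Eq. (6). This is a gamma distribution with shape 1 − α and scale x_c, so its mean and standard deviation should be ⟨x⟩ = (1 − α) x_c and σ_x = (1 − α)^{1/2} x_c. The sketch below recomputes these from the tabulated α and x_c; the functional form is our assumption, and the dictionary simply copies values from the table.

```python
import math

# (alpha, x_c, <x>, sigma_x) copied from Table S2
TABLE_S2 = {
    "MLB (H)":   (0.76, 1240, 303, 612),
    "MLB (RBI)": (0.76, 570, 138, 280),
    "NBA (Pts)": (0.69, 7840, 2408, 4345),
    "NBA (Reb)": (0.69, 3500, 1100, 1967),
    "KBB (AB)":  (0.78, 2600, 575, 1217),
    "MLB (AB)":  (0.77, 5300, 1201, 2515),
    "MLB (IPO)": (0.72, 3400, 950, 1792),
    "KBB (IPO)": (0.69, 2800, 841, 1520),
    "NBA (Min)": (0.64, 20600, 7653, 12564),
    "UK (G)":    (0.56, 138, 61, 92),
}

def gamma_moments(alpha, x_c):
    """Mean and standard deviation of P(x) ∝ x^(-alpha) exp(-x/x_c),
    a gamma pdf with shape k = 1 - alpha and scale x_c."""
    k = 1.0 - alpha
    return k * x_c, math.sqrt(k) * x_c

for league, (alpha, x_c, mean_tab, sd_tab) in TABLE_S2.items():
    mean, sd = gamma_moments(alpha, x_c)
    print(f"{league:10s} <x> {mean:8.1f} (table {mean_tab:>5})  "
          f"sigma_x {sd:8.1f} (table {sd_tab:>5})")
```

With α rounded to two decimals as in the table, the recomputed moments agree with the tabulated ⟨x⟩ and σ_x to within a few percent, which is consistent with the gamma interpretation stated in the caption.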